NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

On the Ricci curvature of attention maps and transformers training and robustness

Farzam, Amirhossein; Schlesinger, Oded; Susskind, Joshua M; DiMartino, Juan Matias; Sapiro, Guillermo (February 2026, NeurIPS 2024 Workshop on Symmetry and Geometry in Neural Representations)

Transformer models have revolutionized machine learning, yet the underpinnings behind their success are only beginning to be understood. In this work, we analyze transformers through the geometry of attention maps, treating them as weighted graphs and focusing on Ricci curvature, a metric linked to spectral properties and system robustness. We prove that lower Ricci curvature, indicating lower system robustness, leads to faster convergence of gradient descent during training. We also show that a higher frequency of positive curvature values enhances robustness, revealing a trade-off between performance and robustness. Building on this, we propose a regularization method to adjust the curvature distribution and provide experimental results supporting our theoretical predictions while offering insights into ways to improve transformer training and robustness. The geometric perspective provided in our paper offers a versatile framework for both understanding and improving the behavior of transformers.
more » « less
Full Text Available
Topology-aware robust representation balancing for estimating causal effects

Farzam, Amirhossein; Aloui, Ahmed; Tarokh, Vahid; Sapiro, Guillermo (July 2025, NeurIPS 2025 High-dimensional Learning Dynamics Workshop)

Representation learning in high-dimensional spaces faces significant robustness challenges with noisy inputs, particularly with heavy-tailed noise. Arguing that topological data analysis (TDA) offers a solution, we leverage TDA to enhance representation stability in neural networks. Our theoretical analysis establishes conditions under which incorporating topological summaries improves robustness to input noise, especially for heavy-tailed distributions. Extending these results to representation-balancing methods used in causal inference, we propose the *Topology-Aware Treatment Effect Estimation* (TATEE) framework, through which we demonstrate how topological awareness can lead to learning more robust representations. A key advantage of this approach is that it requires no ground-truth or validation data, making it suitable for observational settings common in causal inference. The method remains computationally efficient with overhead scaling linearly with data size while staying constant in input dimension. Through extensive experiments with -stable noise distributions, we validate our theoretical results, demonstrating that TATEE consistently outperforms existing methods across noise regimes. This work extends stability properties of topological summaries to representation learning via a tractable framework scalable for high-dimensional inputs, providing insights into how it can enhance robustness, with applications extending to domains facing challenges with noisy data, such as causal inference.
more » « less
Full Text Available
SSOLE: Rethinking orthogonal low-rank embedding for self-supervised learning

Huang, Lun; Qiu, Qiang; Sapiro, Guillermo (April 2025, ICLR 2025)

Self-supervised learning (SSL) aims to learn meaningful representations from unlabeled data. Orthogonal Low-rank Embedding (OLE) shows promise for SSL by enhancing intra-class similarity in a low-rank subspace and promoting inter-class dissimilarity in a high-rank subspace, making it particularly suitable for multi-view learning tasks. However, directly applying OLE to SSL poses significant challenges: (1) the virtually infinite number of "classes" in SSL makes achieving the OLE objective impractical, leading to representational collapse; and (2) low-rank constraints may fail to distinguish between positively and negatively correlated features, further undermining learning. To address these issues, we propose SSOLE (Self-Supervised Orthogonal Low-rank Embedding), a novel framework that integrates OLE principles into SSL by (1) decoupling the low-rank and high-rank enforcement to align with SSL objectives; and (2) applying low-rank constraints to feature deviations from their mean, ensuring better alignment of positive pairs by accounting for the signs of cosine similarities. Our theoretical analysis and empirical results demonstrate that these adaptations are crucial to SSOLE’s effectiveness. Moreover, SSOLE achieves competitive performance across SSL benchmarks without relying on large batch sizes, memory banks, or dual-encoder architectures, making it an efficient and scalable solution for self-supervised tasks. Code is available at https://github.com/husthuaan/ssole.
more » « less
Full Text Available
Addressing misspecification in simulation-based inference through data-driven calibration

Wehenkel, Antoine; Gamella, Juan L; Sener, Ozan; Behrmann, Jens; Sapiro, Guillermo; Jacobsen, Jörn-Henrik; Cuturi, Marco (July 2025, ICML 2025)

Driven by steady progress in deep generative modeling, simulation-based inference (SBI) has emerged as the workhorse for inferring the parameters of stochastic simulators. However, recent work has demonstrated that model misspecification can compromise the reliability of SBI, preventing its adoption in important applications where only misspecified simulators are available. This work introduces robust posterior estimation~(RoPE), a framework that overcomes model misspecification with a small real-world calibration set of ground-truth parameter measurements. We formalize the misspecification gap as the solution of an optimal transport~(OT) problem between learned representations of real-world and simulated observations, allowing RoPE to learn a model of the misspecification without placing additional assumptions on its nature. RoPE demonstrates how OT and a calibration set provide a controllable balance between calibrated uncertainty and informative inference, even under severely misspecified simulators. Results on four synthetic tasks and two real-world problems with ground-truth labels demonstrate that RoPE outperforms baselines and consistently returns informative and calibrated credible intervals.
more » « less
Full Text Available
Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters

Zhu, Wei; Qiu, Qiang; Calderbank, Robert; Sapiro, Guillermo; Cheng, Xiuyuan (April 2022, Journal of machine learning research)

Encoding the scale information explicitly into the representation learned by a convolutional neural network (CNN) is beneficial for many computer vision tasks especially when dealing with multiscale inputs. We study, in this paper, a scaling-translation-equivariant (ST-equivariant) CNN with joint convolutions across the space and the scaling group, which is shown to be both sufficient and necessary to achieve equivariance for the regular representation of the scaling-translation group ST. To reduce the model complexity and computational burden, we decompose the convolutional filters under two pre-fixed separable bases and truncate the expansion to low-frequency components. A further benefit of the truncated filter expansion is the improved deformation robustness of the equivariant representation, a property which is theoretically analyzed and empirically verified. Numerical experiments demonstrate that the proposed scaling-translation-equivariant network with decomposed convolutional filters (ScDCFNet) achieves significantly improved performance in multiscale image classification and better interpretability than regular CNNs at a reduced model size.
more » « less
Full Text Available
Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters

Zhu, Wei; Qiu, Qiang; Calderbank, Robert; Sapiro, Guillermo; Cheng, Xiuyuan (January 2022, Journal of machine learning research)

Full Text Available
A Survey on Privacy from Statistical, Information and Estimation-Theoretic Views

https://doi.org/10.1109/MBITS.2021.3108124

Hsu, Hsiang; Martinezgil, Natalia Lucienne; Bertran, Martin; Sapiro, Guillermo; Calmon, Flavio (January 2022, IEEE BITS the Information Theory Magazine)

Full Text Available
Nested Learning for Multi-Level Classification

https://doi.org/10.1109/ICASSP39728.2021.9415076

Achddou, Raphael; Di Martino, J. Matias; Sapiro, Guillermo (June 2021, ICASSP 2021)
null (Ed.)
Full Text Available
Predictive Value of Early Autism Detection Models Based on Electronic Health Record Data Collected Before Age 1 Year

https://doi.org/10.1001/jamanetworkopen.2022.54303

Engelhard, Matthew M.; Henao, Ricardo; Berchuck, Samuel I.; Chen, Junya; Eichner, Brian; Herkert, Darby; Kollins, Scott H.; Olson, Andrew; Perrin, Eliana M.; Rogers, Ursula; et al (February 2023, JAMA Network Open)

Importance Autism detection early in childhood is critical to ensure that autistic children and their families have access to early behavioral support. Early correlates of autism documented in electronic health records (EHRs) during routine care could allow passive, predictive model-based monitoring to improve the accuracy of early detection. Objective To quantify the predictive value of early autism detection models based on EHR data collected before age 1 year. Design, Setting, and Participants This retrospective diagnostic study used EHR data from children seen within the Duke University Health System before age 30 days between January 2006 and December 2020. These data were used to train and evaluate L2-regularized Cox proportional hazards models predicting later autism diagnosis based on data collected from birth up to the time of prediction (ages 30-360 days). Statistical analyses were performed between August 1, 2020, and April 1, 2022. Main Outcomes and Measures Prediction performance was quantified in terms of sensitivity, specificity, and positive predictive value (PPV) at clinically relevant model operating thresholds. Results Data from 45 080 children, including 924 (1.5%) meeting autism criteria, were included in this study. Model-based autism detection at age 30 days achieved 45.5% sensitivity and 23.0% PPV at 90.0% specificity. Detection by age 360 days achieved 59.8% sensitivity and 17.6% PPV at 81.5% specificity and 38.8% sensitivity and 31.0% PPV at 94.3% specificity. Conclusions and Relevance In this diagnostic study of an autism screening test, EHR-based autism detection achieved clinically meaningful accuracy by age 30 days, improving by age 1 year. This automated approach could be integrated with caregiver surveys to improve the accuracy of early autism screening.
more » « less
Full Text Available
Rethinking Shape From Shading for Spoofing Detection

https://doi.org/10.1109/TIP.2020.3042082

Di Martino, J. Matias; Qiu, Qiang; Sapiro, Guillermo (January 2021, IEEE Transactions on Image Processing)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records